Self-supervised Dimensionality Reduction with Neural Networks and Pseudo-labeling
Dimensionality reduction (DR) is used to explore high-dimensional data in many applications. Deep learning techniques such as autoencoders have been used to provide fast, simple to use, and high-quality DR. However, such methods yield worse visual cluster separation than popular methods such as t-SNE and UMAP. We propose a deep learning DR method called Self-Supervised Network Projection (SSNP) which does DR based on pseudo-labels obtained from clustering. We show that SSNP produces better cluster separation than autoencoders, has out-of-sample, inverse mapping, and clustering capabilities, and is very fast and easy to use.
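The pseudo-labeling step mentioned in the abstract above can be illustrated with a small sketch. The abstract does not name the clustering algorithm, so the minimal k-means below, its naive initialization, and the toy data are all assumptions made for illustration; the network that SSNP would then train on these labels is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_pseudo_labels(points, k, iterations=20):
    """Assign a cluster id to every point; the ids serve as pseudo-labels.

    Assumption: plain k-means with the first k points as initial centers.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the id of its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (hypothetical): two well-separated groups in 3D.
points = [
    (0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (0.1, 0.2, 0.0),
    (5.0, 5.1, 5.0), (5.2, 5.0, 4.9), (4.9, 5.2, 5.1),
]
pseudo_labels = kmeans_pseudo_labels(points, k=2)
print(pseudo_labels)
```

The resulting labels stand in for ground-truth classes, so a supervised projection network can be trained on data that has no labels of its own.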
Deep Learning Multidimensional Projections
Dimensionality reduction methods, also known as projections, are frequently
used for exploring multidimensional data in machine learning, data science, and
information visualization. Among these, t-SNE and its variants have become very
popular for their ability to visually separate distinct data clusters. However,
such methods are computationally expensive for large datasets, suffer from
stability problems, and cannot directly handle out-of-sample data. We propose a
learning approach to construct such projections. We train a deep neural network
based on a collection of samples from a given data universe, and their
corresponding projections, and next use the network to infer projections of
data from the same, or similar, universes. Our approach generates projections
with characteristics similar to the learned ones, is computationally two to
three orders of magnitude faster than SNE-class methods, has no complex-to-set
user parameters, handles out-of-sample data in a stable manner, and can be used
to learn any projection technique. We demonstrate our proposal on several
real-world high-dimensional datasets from machine learning.
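The idea in the abstract above, learning a projection from samples paired with their precomputed 2D coordinates and then inferring coordinates for unseen data, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: a one-hidden-layer network stands in for the deep network, and `reference_projection` is a hypothetical stand-in for coordinates that would in practice come from t-SNE or another technique.

```python
import math
import random

def reference_projection(x):
    """Hypothetical stand-in for a precomputed 2D projection of a 3D sample."""
    return [x[0] - x[1], math.tanh(x[2])]

def init_mlp(n_in, n_hidden, n_out, rng):
    """Random weights for a one-hidden-layer network."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out

def forward(params, x):
    """Return hidden activations and the predicted 2D coordinates."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
    return h, y

def mse(params, xs, ys):
    """Mean squared error between predicted and reference coordinates."""
    total = 0.0
    for x, t in zip(xs, ys):
        y = forward(params, x)[1]
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(xs)

def train(params, xs, ys, epochs=200, lr=0.05):
    """Plain stochastic gradient descent on the squared error."""
    w1, b1, w2, b2 = params
    for _ in range(epochs):
        for x, t in zip(xs, ys):
            h, y = forward(params, x)
            dy = [2.0 * (yi - ti) for yi, ti in zip(y, t)]
            # Gradient at the hidden layer, computed before w2 is changed.
            dh = [sum(dy[k] * w2[k][j] for k in range(len(dy))) * (1.0 - h[j] ** 2)
                  for j in range(len(h))]
            for k in range(len(dy)):
                for j in range(len(h)):
                    w2[k][j] -= lr * dy[k] * h[j]
                b2[k] -= lr * dy[k]
            for j in range(len(h)):
                for i in range(len(x)):
                    w1[j][i] -= lr * dh[j] * x[i]
                b1[j] -= lr * dh[j]

rng = random.Random(0)
train_x = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
train_y = [reference_projection(x) for x in train_x]  # the "learned" projection

params = init_mlp(3, 8, 2, rng)
loss_before = mse(params, train_x, train_y)
train(params, train_x, train_y)
loss_after = mse(params, train_x, train_y)

# Out-of-sample inference: one forward pass, no re-running of the projection.
new_point = forward(params, [0.3, -0.2, 0.5])[1]
print(loss_before, loss_after, new_point)
```

Once trained, projecting a new sample costs a single forward pass, which is where the speed and out-of-sample behavior described in the abstract would come from.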
SDBM: Supervised Decision Boundary Maps for Machine Learning Classifiers
Understanding the decision boundaries of a machine learning classifier is key to gaining insight into how classifiers work. Recently, a technique called Decision Boundary Map (DBM) was developed to enable the visualization of such boundaries by leveraging direct and inverse projections. However, DBM has scalability issues when creating fine-grained maps, and can generate results that are hard to interpret when the classification problem has many classes. In this paper we propose a new technique called Supervised Decision Boundary Maps (SDBM), which uses a supervised, GPU-accelerated projection technique that solves the original DBM shortcomings. We show through several experiments that SDBM generates results that are much easier to interpret when compared to DBM, and is faster and easier to use, while still being generic enough to be used with any type of single-output classifier.
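The general decision-map construction that the abstract above builds on can be sketched as follows: every pixel of the 2D image is mapped back to data space by an inverse projection and then labeled by the classifier. Both `toy_inverse_project` and `toy_classify` are hypothetical stand-ins; in SDBM the inverse mapping would be learned, which this sketch does not attempt.

```python
def decision_map(classify, inverse_project, resolution, bounds=(0.0, 1.0)):
    """Label every pixel of a resolution x resolution image.

    classify: maps a data-space point to a class label.
    inverse_project: maps a 2D image point back to data space.
    """
    lo, hi = bounds
    step = (hi - lo) / resolution
    grid = []
    for row in range(resolution):
        labels = []
        for col in range(resolution):
            # Use the pixel centre as the 2D point to invert.
            px = lo + (col + 0.5) * step
            py = lo + (row + 0.5) * step
            labels.append(classify(inverse_project((px, py))))
        grid.append(labels)
    return grid

def toy_inverse_project(p):
    """Hypothetical inverse projection: lift a 2D point into 3D."""
    return (p[0], p[1], p[0] * p[1])

def toy_classify(x):
    """Hypothetical single-output classifier with a linear boundary."""
    return 1 if x[0] + x[1] > 1.0 else 0

grid = decision_map(toy_classify, toy_inverse_project, resolution=4)
for row in grid:
    print("".join(str(label) for label in row))
```

Coloring each pixel by its label yields the map; the cost grows with the square of the resolution, which suggests why fine-grained maps are a scalability concern.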
Constructing and Visualizing High-Quality Classifier Decision Boundary Maps
Visualizing decision boundaries of machine learning classifiers can help in classifier design, testing and fine-tuning. Decision maps are visualization techniques that overcome the key sparsity-related limitation of scatterplots for this task. To increase the trustworthiness of decision map use, we perform an extensive evaluation considering the dimensionality-reduction (DR) projection techniques underlying decision map construction. We extend the visual accuracy of decision maps by proposing additional techniques to suppress errors caused by projection distortions. Additionally, we propose ways to estimate and visually encode the distance-to-decision-boundary in decision maps, thereby enriching the conveyed information. We demonstrate our improvements and the insights that decision maps convey on several real-world datasets.
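One simple way to approximate a distance-to-boundary value per pixel is sketched below. It is an image-space proxy chosen for illustration, not necessarily one of the estimators proposed in the work above: for each pixel it measures the distance, in pixels, to the nearest pixel carrying a different label. The label grid is a hypothetical example.

```python
import math

def distance_to_boundary(grid):
    """For each pixel, distance in pixels to the nearest differently labeled pixel.

    Brute force over all pixel pairs; returns math.inf where the grid
    contains a single class only.
    """
    cells = [(r, c, label)
             for r, row in enumerate(grid)
             for c, label in enumerate(row)]
    result = []
    for r, row in enumerate(grid):
        out_row = []
        for c, label in enumerate(row):
            nearest = min(
                (math.hypot(r - r2, c - c2)
                 for r2, c2, other in cells if other != label),
                default=math.inf,
            )
            out_row.append(nearest)
        result.append(out_row)
    return result

# Hypothetical 3x3 decision map with two classes.
label_grid = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
]
dist = distance_to_boundary(label_grid)
for row in dist:
    print([round(d, 3) for d in row])
```

The values could then drive brightness or saturation, so that pixels near a boundary look different from pixels deep inside a decision zone.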
Using multiple attribute-based explanations of multidimensional projections to explore high-dimensional data
Multidimensional projections (MPs) are effective methods for visualizing high-dimensional datasets to find structures in the data like groups of similar points and outliers. The insights obtained from MPs can be amplified by complementing these techniques with several so-called explanatory mechanisms. We present and discuss a set of six such mechanisms that explain MPs in terms of similar dimensions, local dimensionality, and dimension correlations. We implement our explanatory tools using an image-based approach, which is efficient to compute, scales well visually for large and dense MP scatterplots, and can handle any projection technique. We demonstrate how the provided explanatory views can be combined to augment each other's value and thereby lead to refined insights in the data for several high-dimensional datasets, and how these insights correlate with known facts about the data under study.
HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters
Projection algorithms such as t-SNE or UMAP are useful for the visualization
of high dimensional data, but depend on hyperparameters which must be tuned
carefully. Unfortunately, iteratively recomputing projections to find the
optimal hyperparameter value is computationally intensive and unintuitive due
to the stochastic nature of these methods. In this paper we propose HyperNP, a
scalable method that allows for real-time interactive hyperparameter
exploration of projection methods by training neural network approximations.
HyperNP can be trained on a fraction of the total data instances and
hyperparameter configurations and can compute projections for new data and
hyperparameters at interactive speeds. HyperNP is compact in size and fast to
compute, thus allowing it to be embedded in lightweight visualization systems
such as web browsers. We evaluate HyperNP across three datasets in terms of
performance and speed. The results suggest that HyperNP is accurate,
scalable, interactive, and appropriate for use in real-world settings.
Learning Multidimensional Projections with Neural Networks (original Portuguese title: Aprendendo projeções multidimensionais com redes neurais)
Learning Multidimensional Projections with Neural Networks
In the wake of the revolution brought by Deep Learning, we believe neural networks can be leveraged as a tool in the service of dimensionality reduction (DR) for understanding large datasets with many dimensions (measurements). In this work, we present techniques for DR based on neural networks which improve over existing techniques on criteria such as scalability, dealing with unseen data, cluster separation, and ease of use, to name a few. We also present a quantitative evaluation of popular techniques, and propose novel applications that highlight the importance of DR techniques as tools for high-dimensional data analysis.
Stability Analysis of Supervised Decision Boundary Maps
Understanding how a machine learning classifier works is an important task in machine learning engineering. However, doing this for an arbitrary classifier is generally difficult. We propose to leverage visualization methods for this task. For this, we extend a recent technique called Decision Boundary Map (DBM) which graphically depicts how a classifier partitions its input data space into decision zones separated by decision boundaries. We use a supervised, GPU-accelerated technique that computes bidirectional mappings between the data and projection spaces to solve several shortcomings of DBM, such as accuracy and speed. We present several experiments that show that SDBM generates results which are easier to interpret and far less prone to noise, and computes significantly faster than DBM, while maintaining the genericity and ease of use of DBM for any type of single-output classifier. We also show, in addition to earlier work, that SDBM is stable with respect to various types and amounts of changes of the training set used to construct the visualized classifiers. This property was, to our knowledge, not investigated for any comparable method for visualizing classifier decision maps, and is essential for the deployment of such visualization methods in analyzing real-world classification models.